65 research outputs found

    A Semantic and Syntactic Similarity Measure for Political Tweets

    Measurement of the semantic and syntactic similarity of human utterances is essential in allowing machines to understand dialogue with users. However, human language is complex, and the semantic meaning of an utterance usually depends upon the context at a given time and learnt experience of the meanings of the words used. This is particularly challenging when automatically understanding the meaning of social media content, such as tweets, which can contain non-standard language. Short Text Semantic Similarity measures can be adapted to measure the degree of similarity of a pair of tweets. This work presents a new Semantic and Syntactic Similarity Measure (TSSSM) for political tweets. The approach uses word embeddings to determine semantic similarity and extracts syntactic features to overcome the limitations of current measures, which may miss identical sequences of words. A large dataset of tweets focusing on the political domain was collected, pre-processed and used to train the word embedding model, with various experiments performed to determine the optimal model and parameters. A selection of tweet pairs was evaluated by humans for semantic equivalence and correlated against the measure. The new measure can be used in a variety of applications, including identifying and analyzing political narratives. Experiments on three diverse human-labelled test datasets demonstrate that the measure outperforms an existing measure, performs well on tweets from the political domain and may also generalize outside the political domain.
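The combination described above can be sketched as a weighted blend of an embedding-based semantic score and a word-sequence syntactic score. This is a minimal illustration, not the authors' trained model: the toy embeddings, the bigram-overlap feature, and the weighting `alpha` are all assumptions.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def tweet_vector(tokens, embeddings):
    # Represent a tweet as the mean of its in-vocabulary word vectors.
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vecs, axis=0)

def ngram_overlap(a, b, n=2):
    # Simple syntactic feature rewarding identical sequences of words:
    # Jaccard overlap of word bigrams.
    grams = lambda s: {tuple(s[i:i + n]) for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / max(len(ga | gb), 1)

# Toy 2-d embeddings standing in for a model trained on political tweets.
emb = {"vote": np.array([1.0, 0.2]), "today": np.array([0.9, 0.3]),
       "election": np.array([0.8, 0.1])}
t1 = ["vote", "today"]
t2 = ["election", "today"]
semantic = cosine(tweet_vector(t1, emb), tweet_vector(t2, emb))
syntactic = ngram_overlap(t1, t2)
alpha = 0.7  # assumed weighting between the two components
score = alpha * semantic + (1 - alpha) * syntactic
```

In practice the embeddings would come from the domain-trained model and the weighting would be tuned against the human-labelled pairs.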

    People's Panel for Artificial Intelligence

    Citizen trust in Artificial Intelligence (AI) applications and data-driven technologies is at the forefront of ethical guidelines, principles, and future AI legislation. The creation of successful products and services which benefit people and society requires many diverse citizen voices, which are often absent from R&D processes and wider public discourse. Citizens need the opportunity and confidence to engage with researchers and innovators through a shared language, understanding and relationship between data and AI. Through a 2022 Public Engagement Grant award from the Alan Turing Institute, the UK's national institute for data science and artificial intelligence, we established a Greater Manchester (GM) People's Panel for AI (PPfAI) to empower marginalised communities to contribute to AI research and development. We reached out to two communities, The Tatton in Salford and Inspire in Levenshulme, neither of which had previously engaged with the research and development sector. A key motivation for this project was to build people's confidence to ask questions about how their data and AI are being used by businesses, through an increased understanding of what AI is and how it is used. Starting in July 2022, we ran three community interactive roadshows to explore how AI impacted people's everyday lives, debating technology, exploring a range of applications, and obtaining very diverse opinions. Nine citizens were recruited to the People's Panel and completed two days of training about key aspects of data, AI and ethics, including learning the Open Data Institute's Consequence Scanning toolkit. Four live People's Panel sessions were held where tech businesses and researchers pitched their ideas and were subject to intensive questioning by panel members.
    To sustain the panel, we worked with panel members, businesses and the Greater Manchester Equality Alliance (GM=EqAl), which works with marginalised communities to influence regional policy making, to develop a People's Panel for AI Terms of Reference, which is freely available. After taking part, panel members reported an increase in confidence in being able to question businesses and researchers. Businesses heard a diverse stakeholder voice on the ethical impacts of their products and services, which has led, and is leading, to many changes, from product design considerations to ethical practices.

    A methodology for the resolution of cashtag collisions on Twitter – A natural language processing & data fusion approach

    Investors utilise social media such as Twitter as a means of sharing news surrounding financial stocks listed on international stock exchanges. Company ticker symbols are used to uniquely identify companies listed on stock exchanges and can be embedded within tweets to create clickable hyperlinks referred to as cashtags, allowing investors to associate their tweets with specific companies. The main limitation is that identical ticker symbols are present on exchanges all over the world, and when searching for such cashtags on Twitter, a stream of tweets is returned which matches any company to which the cashtag refers - we refer to this as a cashtag collision. The presence of colliding cashtags can cause confusion for investors seeking news regarding a specific company. A resolution to this issue would benefit investors who rely on the speediness of tweets for financial information, saving them precious time. We propose a methodology to resolve this problem which combines Natural Language Processing and Data Fusion to construct company-specific corpora to aid in the detection and resolution of colliding cashtags, so that tweets can be classified as being related to a specific stock exchange or not. Supervised machine learning classifiers are trained twice on each tweet - once on a count vectorisation of the tweet text, and again with the assistance of features contained in the company-specific corpora. We validate the cashtag collision methodology by carrying out an experiment involving companies listed on the London Stock Exchange. Results show that several machine learning classifiers benefit from the use of the custom corpora, yielding higher classification accuracy in the prediction and resolution of colliding cashtags.
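The two feature passes described above can be sketched as follows. This is a simplified stand-in, not the authors' pipeline: the vocabulary, the hypothetical company corpus, and the token-overlap feature are all illustrative assumptions.

```python
from collections import Counter

def count_vectorise(text, vocab):
    # Pass one: bag-of-words counts over a fixed vocabulary,
    # a minimal form of count vectorisation.
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def corpus_feature(text, company_corpus):
    # Pass two: fraction of tweet tokens found in a company-specific
    # corpus -- a stand-in for the fused corpus features.
    tokens = set(text.lower().split())
    return len(tokens & company_corpus) / max(len(tokens), 1)

# Hypothetical corpus for a London Stock Exchange listing.
lse_corpus = {"ftse", "london", "plc", "sterling"}
tweet = "$ABC rallies on the FTSE after London open"
vocab = ["ftse", "rallies", "nasdaq"]
vec = count_vectorise(tweet, vocab)        # features for pass one
extra = corpus_feature(tweet, lse_corpus)  # added feature for pass two
```

A supervised classifier would then be trained once on `vec` alone and once on `vec` plus `extra`, allowing the two passes to be compared.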

    Modelling Road Congestion using a Fuzzy System and Real-World Data for Connected and Autonomous Vehicles

    Road congestion is estimated to cost the United Kingdom £307 billion by 2030. Furthermore, congestion contributes enormously to damaging the environment and people's health. In an attempt to combat the damage congestion is causing, new technologies are being developed, such as intelligent infrastructures and smart vehicles. The aim of this study is to develop a fuzzy system that can classify congestion using a real-world dataset, referred to as the Manchester Urban Congestion Dataset, which contains data similar to that collected by connected and autonomous vehicles. A set of fuzzy membership functions and rules was developed using a road congestion ontology and in conjunction with domain experts. Experiments are conducted to evaluate the fuzzy system in terms of its precision and recall in classifying congestion. Comparisons are made in terms of performance with the traditional classification algorithms decision trees and Naïve Bayes, using the Red, Amber, and Green classification methods currently implemented by Transport for Greater Manchester to label the dataset. The results show that the fuzzy system can predict road congestion using volume and journey time, outperforming both decision trees and Naïve Bayes.
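A fuzzy system of the kind described works by mapping crisp inputs through membership functions and firing rules. The sketch below is a minimal Mamdani-style illustration on the two inputs named in the abstract; the triangular functions, input scales, and rule base are assumptions, not the expert-elicited ones from the paper.

```python
def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def congestion_level(volume, journey_time):
    # Illustrative rule base (min = AND, one rule per output class):
    #   IF volume is HIGH AND journey_time is LONG  THEN congested
    #   IF volume is LOW  AND journey_time is SHORT THEN free_flow
    high_volume = tri(volume, 40, 80, 120)    # vehicles/interval (assumed scale)
    long_time = tri(journey_time, 5, 15, 25)  # minutes (assumed scale)
    low_volume = tri(volume, 0, 20, 60)
    short_time = tri(journey_time, 0, 4, 10)
    return {"congested": min(high_volume, long_time),
            "free_flow": min(low_volume, short_time)}

levels = congestion_level(volume=90, journey_time=18)
```

The class with the highest firing strength would be reported and compared against the Red/Amber/Green labels.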

    An investigation into fuzzy negation in semantic similarity measures

    Machine computation of semantic similarity between short texts aims to approximate human measurements of similarity, often influenced by context, domain knowledge, and life experiences. Logical negation in natural language plays an important role as it can change the polarity of meaning within a sentence, yet it is a complex problem for semantic similarity measures to identify and measure. This paper investigates the impact of logical negation on determining fuzzy semantic similarity between short texts containing fuzzy words. A methodology is proposed to interpret the implications of a negation word on a fuzzy word within the context of a user utterance. Three known fuzzy logical NOT operators, proposed by Zadeh, Yager and Sugeno, are incorporated into a fuzzy semantic similarity measure called FUSE. Experiments are conducted on a sample dataset of short text inputs captured through human engagement with a dialogue system. Results show that Yager's weighted operator is the most suitable, achieving a matching threshold of 90.47% accuracy. This finding has significant implications for the field of semantic similarity measures. It provides a more accurate way to measure the similarity of short texts that contain fuzzy words combined with logical negation. Whilst validation of the approach on more substantial datasets is required, this study contributes to a better understanding of how to account for logical negation in fuzzy semantic similarity measures and provides a valuable methodology for future research in this area.
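The three complement operators named above have standard definitions, shown below applied to a single membership value. The example fuzzy word and the parameter values `w` and `lam` are illustrative assumptions; the paper tunes Yager's weight, which is not reproduced here.

```python
def zadeh_not(x):
    # Zadeh's standard fuzzy complement: N(x) = 1 - x.
    return 1.0 - x

def yager_not(x, w=2.0):
    # Yager's class of complements: N(x) = (1 - x^w)^(1/w).
    # w is a tunable weight; w = 1 recovers Zadeh's complement.
    return (1.0 - x ** w) ** (1.0 / w)

def sugeno_not(x, lam=0.5):
    # Sugeno's class of complements: N(x) = (1 - x) / (1 + lam * x).
    # Requires lam > -1; lam = 0 recovers Zadeh's complement.
    return (1.0 - x) / (1.0 + lam * x)

mu = 0.8  # assumed membership of a fuzzy word in its category
negated = {"zadeh": zadeh_not(mu),
           "yager": yager_not(mu),
           "sugeno": sugeno_not(mu)}
```

Applying one of these to the membership of a negated fuzzy word before the similarity computation is how a NOT operator can be folded into a measure such as FUSE.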

    Fuzzy Influence in Fuzzy Semantic Similarity Measures

    The field of Computing with Words has been pivotal in the development of fuzzy semantic similarity measures. Fuzzy semantic similarity measures allow the modelling of words in a given context with a tolerance for the imprecise nature of human perceptions. In this work, we look at how this imprecision can be addressed with the use of fuzzy semantic similarity measures in the field of natural language processing. A fuzzy influence factor is introduced into an existing measure known as FUSE. FUSE computes the similarity between two short texts based on weighted syntactic and semantic components in order to address the issue of comparing fuzzy words that exist in different word categories. A series of empirical experiments investigates the effect of introducing a fuzzy influence factor into FUSE across a number of short text datasets. Comparisons with other similarity measures demonstrate that the fuzzy influence factor has a positive effect in improving the correlation of machine similarity judgments with similarity judgments of humans.

    Trust in Computational Intelligence Systems: A Case Study in Public Perceptions

    The public debate and discussion about trust in Computational Intelligence (CI) systems is not new, but a topic that has seen a recent rise. This is mainly due to the explosion of technological innovations that have been brought to the attention of the public, from lab to reality, usually through media reporting. This growth in public attention was further compounded by the 2018 GDPR legislation and new laws regarding the right to explainable systems, such as the use of "accurate data", "clear logic" and the "use of appropriate mathematical and statistical procedures for profiling". Therefore, trust is not just a topic for debate - it must be addressed from the outset, from the selection of the fundamental machine learning processes that are used to create models embedded within autonomous decision-making systems, to the selection of training, validation and testing data. This paper presents current work on trust in the field of Computational Intelligence systems and discusses the legal framework we should ascribe to trust in CI systems. A case study examining current public perceptions of recent CI-inspired technologies, conducted at a national science festival, is presented with some surprising results. Finally, we look at current research underway that is aiming to increase trust in Computational Intelligence systems, and we identify a clear educational gap.

    A Heuristic Based Pre-processing Methodology for Short Text Similarity Measures in Microblogs

    Short text similarity measures have many applications in online social networks (OSNs), as they are being integrated into machine learning algorithms. However, data quality is a major challenge in most OSNs, particularly Twitter. The sparse, ambiguous, informal, and unstructured nature of the medium makes it difficult to capture the underlying semantics of the text. Therefore, text pre-processing is a crucial phase in similarity identification applications, such as clustering and classification, because selecting the appropriate data processing methods contributes to increased correlations of the similarity measure. This research proposes a novel heuristic-driven pre-processing methodology for enhancing the performance of similarity measures in the context of Twitter tweets. The components of the proposed pre-processing methodology are discussed and evaluated on an annotated dataset that was published as part of the SemEval-2014 shared task. An experimental analysis was conducted using the cosine angle as a similarity measure to assess the effect of our method against a baseline (C-Method). Experimental results indicate that our approach outperforms the baseline in terms of correlations and error rates.
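The evaluation setup above can be illustrated as a small pipeline: clean the tweet, then compare bag-of-words vectors by the cosine angle. The specific cleaning steps shown (lower-casing, stripping URLs and mentions, unwrapping hashtags) are generic assumptions, not the paper's heuristic rules.

```python
import re
from collections import Counter
from math import sqrt

def preprocess(tweet):
    # Illustrative tweet pre-processing: lower-case, strip URLs and
    # user mentions, and keep hashtag words without the '#'.
    tweet = tweet.lower()
    tweet = re.sub(r"https?://\S+", "", tweet)  # remove URLs
    tweet = re.sub(r"@\w+", "", tweet)          # remove mentions
    tweet = tweet.replace("#", "")              # unwrap hashtags
    return tweet.split()

def cosine_sim(tokens_a, tokens_b):
    # Cosine angle between bag-of-words count vectors.
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

s = cosine_sim(preprocess("Traffic is terrible on the #M60 http://t.co/x"),
               preprocess("@TfGM traffic terrible on the m60 today"))
```

Without the URL, mention, and hashtag handling, the shared content words would be diluted by noise tokens and the cosine score would drop, which is the effect the pre-processing methodology targets.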

    Using Fuzzy Set Similarity in Sentence Similarity Measures

    Sentence similarity measures the similarity between two blocks of text. A semantic similarity measure between individual pairs of words, each taken from the two blocks of text, has been used in STASIS. Word similarity is measured based on the distance between the words in the WordNet ontology. If vague words, referred to as fuzzy words, are not found in WordNet, their semantic similarity cannot be used in the sentence similarity measure. FAST and FUSE transform these vague words into fuzzy set representations, type-1 and type-2 respectively, to create ontological structures where the same semantic similarity measure used in WordNet can then be applied. This paper investigates eliminating the process of building an ontology with the fuzzy words and instead directly using fuzzy set similarity measures between the fuzzy words in the task of sentence similarity measurement. Their performance is evaluated based on their correlation with human judgments of sentence similarity. In addition, statistical tests showed no significant difference between the sentence similarity values produced using fuzzy set similarity measures between fuzzy sets representing fuzzy words and those produced using FAST semantic similarity within ontologies representing fuzzy words.
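A direct fuzzy set similarity of the kind investigated above can be computed straight from membership functions, with no ontology. The sketch below uses the standard fuzzy Jaccard similarity on discrete type-1 sets; the example membership values for the fuzzy words are hypothetical, and the paper's own choice of set similarity measure may differ.

```python
def jaccard_fuzzy(mu_a, mu_b):
    # Jaccard similarity between two discrete fuzzy sets given as
    # {element: membership} dictionaries over a shared universe.
    # min is the standard fuzzy intersection, max the fuzzy union.
    universe = set(mu_a) | set(mu_b)
    inter = sum(min(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in universe)
    union = sum(max(mu_a.get(x, 0.0), mu_b.get(x, 0.0)) for x in universe)
    return inter / union if union else 0.0

# Hypothetical type-1 fuzzy sets for the fuzzy words "huge" and
# "massive" on a 0-10 size scale.
huge = {7: 0.3, 8: 0.7, 9: 1.0, 10: 1.0}
massive = {8: 0.4, 9: 0.9, 10: 1.0}
sim = jaccard_fuzzy(huge, massive)
```

This word-level score can then be substituted for the WordNet-derived similarity wherever a fuzzy word pair occurs in the sentence similarity computation.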

    An Empirical Performance Evaluation of Semantic-Based Similarity Measures in Microblogging Social Media

    Measuring textual semantic similarity has been a subject of intense discussion in NLP and AI for many years. A new area of research has emerged that applies semantic similarity measures within Twitter. However, the development of these measures for the semantic analysis of tweets imposes fundamental challenges. The sparsity, ambiguity, and informality present in social media are hampering the performance of traditional textual similarity measures, as tweets have special syntactic and semantic characteristics. This paper reviews and evaluates the performance of topological, statistical, and hybrid similarity measures in the context of Twitter analysis. Furthermore, the performance of each measure is compared against a naïve keyword-based similarity computation method to assess the significance of semantic computation in capturing the meaning in tweets. An experiment is designed and conducted to evaluate the different measures by examining various metrics, including correlation, error rates, and statistical tests, on a benchmark dataset. The potential weaknesses of semantic similarity measures in relation to Twitter applications of textual similarity assessment and the research contributions are discussed. This research highlights challenges and potential improvement areas for the semantic similarity of tweets, providing a resource for researchers and practitioners.